N-Gram-Based Statistical Machine Translation versus Syntax Augmented Machine Translation: Comparison and System Combination

نویسندگان

  • Maxim Khalilov
  • José A. R. Fonollosa
چکیده

In this paper we compare and contrast two approaches to Machine Translation (MT): the CMU-UKA Syntax Augmented Machine Translation system (SAMT) and UPC-TALP N-gram-based Statistical Machine Translation (SMT). SAMT is a hierarchical syntax-driven translation system underlain by a phrase-based model and a target part parse tree. In N-gram-based SMT, the translation process is based on bilingual units related to word-to-word alignment and statistical modeling of the bilingual context following a maximumentropy framework. We provide a stepby-step comparison of the systems and report results in terms of automatic evaluation metrics and required computational resources for a smaller Arabic-to-English translation task (1.5M tokens in the training corpus). Human error analysis clarifies advantages and disadvantages of the systems under consideration. Finally, we combine the output of both systems to yield significant improvements in translation quality.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparación y combinación de los sistemas de traducción automática basados en n-gramas y en sintaxis

In this paper we shall compare two approaches to machine translation: the Syntax Augmented Machine Translation system (SAMT), which is a syntaxdriven translation system, underlain by phrase-based model, and the n-gram-based Statistical Machine Translation (SMT), in which a translation process is based on statistical modeling of the bilingual context. We provide a step-by-step comparison of the ...

متن کامل

A New Subtree-Transfer Approach to Syntax-Based Reordering for Statistical Machine Translation

In this paper we address the problem of translating between languages with word order disparity. The idea of augmenting statistical machine translation (SMT) by using a syntax-based reordering step prior to translation, proposed in recent years, has been quite successful in improving translation quality. We present a new technique for extracting syntax-based reordering rules, which are derived ...

متن کامل

The ISL Phrase-Based MT System for the 2007 ACL Workshop on Statistical Machine Translation

In this paper we describe the Interactive Systems Laboratories (ISL) phrase-based machine translation system used in the shared task ”Machine Translation for European Languages” of the ACL 2007 Workshop on Statistical Machine Translation. We present results for a system combination of the ISL syntax-augmented MT system and the ISL phrase-based system by combining and rescoring the n-best lists ...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

The Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL Workshop on Statistical Machine Translation

We describe the CMU-UKA Syntax Augmented Machine Translation system ‘SAMT’ used for the shared task “Machine Translation for European Languages” at the ACL 2007 Workshop on Statistical Machine Translation. Following an overview of syntax augmented machine translation, we describe parameters for components in our open-source SAMT toolkit that were used to generate translation results for the Spa...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009